Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply universal gemm to bwd_weight_cshuffle operator #1658

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

mozga-amd
Copy link
Contributor

No description provided.

@mozga-amd mozga-amd force-pushed the mozga-amd/universal_gemm_weight branch from 118fd3f to 932e6a0 Compare November 12, 2024 15:39
Copy link
Contributor

@bartekxk bartekxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue review later

Copy link
Contributor

@bartekxk bartekxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mozga-amd please remove merged groups support for this kernel. Then please update instances. After that we can review it again

@mozga-amd mozga-amd force-pushed the mozga-amd/universal_gemm_weight branch from 932e6a0 to 860433e Compare December 16, 2024 12:03
@mozga-amd mozga-amd marked this pull request as ready for review December 16, 2024 12:05
spolifroni-amd
spolifroni-amd previously approved these changes Dec 16, 2024
Copy link
Contributor

@spolifroni-amd spolifroni-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to review for docs. If anything needs to be documented, let me know.

@aosewski aosewski requested a review from bartekxk December 17, 2024 09:50
Copy link
Contributor

@bartekxk bartekxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comment. Please focus firstly on:
Regarding API changes, we have two solutions but please make some local measurements:

  • If gridwise gemm v3 is better for each case can we move BlkGemmPipeSched and BlgGemmPipVer at the end and keep api consistent?

  • If gridwise gemm v3 is better but not for each, then can we restore previous implementation and copy your new DeviceGroupedConvBwdWeight_Xdl_CShuffle as DeviceGroupedConvBwdWeight_Xdl_CShuffleV3?

ck::BlockGemmPipelineVersion::v1, // BlkGemmPipelineVer
ComputeTypeA, // ComputeTypeA
ComputeTypeB>; // ComputeTypeB
// clang-format on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding API changes, we have two solutions but please make some local measurements:

  1. If gridwise gemm v3 is better for each case can we move BlkGemmPipeSched and BlgGemmPipVer at the end and keep api consistent?
  2. If gridwise gemm v3 is better but not for each, then can we restore previous implementation and copy your new DeviceGroupedConvBwdWeight_Xdl_CShuffle as DeviceGroupedConvBwdWeight_Xdl_CShuffleV3?

const index_t K,
const std::array<index_t, NDimSpatial + 3>& output_strides)
{
const index_t BatchStride = output_strides[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For, out, in and wei. Can we add some condintion if (NumGroupsToMerge == 1) {/* Create descriptor in typical way (like in v1) ?} Or if possible use transform_v1 in device_op? I think it could inpact on performance

@mozga-amd mozga-amd requested a review from afagaj as a code owner January 6, 2025 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants